S&P 500 Index Time Series EDA

Author

C. Roberts/R. J. Serrano

Source: Exploratory Data Analysis of Time Series Data

Read the S&P 500 Index dataset containing the daily adjusted closing price, daily return (%), and volatility.

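library(tidyverse)  # read_rds(), %>% and dplyr verbs
library(timetk)     # plot_time_series() and the plot_*_diagnostics() helpers

spx <- read_rds('../data/spx_return_vol_tbl.rds')

spx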
# A tibble: 2,526 × 4
   date       adjusted   return volatility
   <date>        <dbl>    <dbl>      <dbl>
 1 2013-01-03    1459. -0.209      0.209  
 2 2013-01-04    1466.  0.487      0.487  
 3 2013-01-07    1462. -0.312      0.312  
 4 2013-01-08    1457. -0.324      0.324  
 5 2013-01-09    1461.  0.266      0.266  
 6 2013-01-10    1472.  0.760      0.760  
 7 2013-01-11    1472. -0.00475    0.00475
 8 2013-01-14    1471. -0.0931     0.0931 
 9 2013-01-15    1472.  0.113      0.113  
10 2013-01-16    1473.  0.0197     0.0197 
# … with 2,516 more rows
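The printed rows suggest how the derived columns relate to the price: in every row shown, volatility equals the absolute value of return, and return looks like the percentage change in adjusted from the previous trading day. The sketch below rebuilds the two columns from a short, made-up price vector under exactly those assumptions (simple percentage returns, volatility = |return|); it is not the authors' preprocessing code.

```r
# Hypothetical reconstruction (assumptions: simple % returns, volatility = |return|)
prices <- c(100, 101, 99.99, 102)                  # made-up adjusted closing prices
ret <- 100 * (prices[-1] / head(prices, -1) - 1)   # daily return (%) vs previous close
vol <- abs(ret)                                    # absolute daily return used as volatility
data.frame(close = prices[-1], return = round(ret, 3), volatility = round(vol, 3))
```

The first price has no previous close, so it yields no return; if the dataset was built this way, that would explain why the tibble starts on 2013-01-03 rather than on the first trading day of 2013.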

Plot time series

spx %>% 
     select(date, adjusted) %>% 
     plot_time_series(date, adjusted, 
                      .title = 'S&P 500 Index Daily Adjusted Closing Price (Jan 2013 - Jan 2023 (partial))')
spx %>% 
     plot_time_series(date, return, 
                      .title = 'S&P 500 Index Daily Adjusted Closing Return Percentage (Jan 2013 - Jan 2023 (partial))')
spx %>% 
     plot_time_series(date, volatility, 
                      .title = 'S&P 500 Index Daily Volatility Percentage (Jan 2013 - Jan 2023 (partial))')

ACF/PACF Diagnostics

spx %>% 
     plot_acf_diagnostics(date, adjusted)
spx %>% 
     plot_acf_diagnostics(date, return)
spx %>% 
     plot_acf_diagnostics(date, volatility)
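To read these plots, it helps to compare them against series whose behaviour is known. The sketch below uses simulated data, not the S&P 500: a random walk (the cumulative sum of i.i.d. shocks) stands in for a price level, and the shocks themselves stand in for returns. With base R's acf() and pacf(), the random walk shows a lag-1 autocorrelation close to 1 that decays slowly, while the shocks show autocorrelations near zero. If the adjusted-price ACF decays slowly in the same way, that points to the same kind of non-stationarity.

```r
# Simulated illustration (not the S&P series): random walk vs. its increments
set.seed(123)
steps <- rnorm(1000)        # i.i.d. shocks, a stand-in for daily returns
walk  <- cumsum(steps)      # random walk, a stand-in for a price level

acf_walk  <- acf(walk,  lag.max = 20, plot = FALSE)$acf[-1]  # drop lag 0
acf_steps <- acf(steps, lag.max = 20, plot = FALSE)$acf[-1]
pacf_walk <- pacf(walk, lag.max = 20, plot = FALSE)$acf      # starts at lag 1

# Lag-1 values: near 1 for the walk, near 0 for the shocks
round(c(walk_acf1 = acf_walk[1], steps_acf1 = acf_steps[1],
        walk_pacf1 = pacf_walk[1]), 3)
```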

Seasonal Diagnostics

spx %>% 
     plot_seasonal_diagnostics(
          .date_var = date, 
          .value = adjusted
     )
spx %>% 
     plot_seasonal_diagnostics(
          .date_var = date, 
          .value = return
     )
spx %>% 
     plot_seasonal_diagnostics(
          .date_var = date, 
          .value = volatility
     )
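plot_seasonal_diagnostics() summarises the value within calendar groupings (for example by weekday or month) so that systematic differences between groups stand out. A rough base-R equivalent is to group a series by those calendar features and compare a summary statistic per group. The sketch below does this for simulated weekday-only data (a stand-in for trading days that ignores market holidays); with pure noise, the group medians should differ only by chance.

```r
# Simulated weekday-only series (stand-in for trading days; holidays ignored)
all_days   <- seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day")
wday_num   <- as.POSIXlt(all_days)$wday          # 0 = Sunday ... 6 = Saturday
trade_days <- all_days[wday_num %in% 1:5]        # keep Monday to Friday

set.seed(7)
df <- data.frame(
  ret   = rnorm(length(trade_days)),             # noise standing in for daily returns
  wday  = factor(as.POSIXlt(trade_days)$wday, levels = 1:5,
                 labels = c("Mon", "Tue", "Wed", "Thu", "Fri")),
  month = factor(as.POSIXlt(trade_days)$mon + 1, levels = 1:12, labels = month.abb)
)

# Median return within each calendar group (what the box plots compare visually)
by_wday  <- aggregate(ret ~ wday,  data = df, FUN = median)
by_month <- aggregate(ret ~ month, data = df, FUN = median)
by_wday
```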

Anomaly Diagnostics

spx %>% 
     plot_anomaly_diagnostics(
          .date_var = date, 
          .value = adjusted, 
          .alpha = 0.05, 
          .max_anomalies = 0.03
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
spx %>% 
     plot_anomaly_diagnostics(
          .date_var = date, 
          .value = return, 
          .alpha = 0.05, 
          .max_anomalies = 0.03
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
spx %>% 
     plot_anomaly_diagnostics(
          .date_var = date, 
          .value = volatility, 
          .alpha = 0.05, 
          .max_anomalies = 0.03
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
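As we understand timetk's default method, plot_anomaly_diagnostics() decomposes the series (using the frequency and trend spans it prints above), then flags remainder values that fall far outside the interquartile range, with .alpha controlling how wide the limits are and .max_anomalies capping the share of points that can be flagged (here 3%). The sketch below is a simplified, self-contained version of that IQR rule applied to a remainder vector. The multiplier 0.15 / alpha (3 × IQR when alpha = 0.05) and the capping step are assumptions modelled on the anomalize package, not a copy of timetk's code, and flag_iqr() is a hypothetical helper.

```r
# Simplified IQR anomaly rule on a remainder series (sketch, not timetk's code).
# Assumptions: limits = Q1 - k*IQR and Q3 + k*IQR with k = 0.15 / alpha,
# and at most max_anomalies * n of the most extreme points are kept.
flag_iqr <- function(remainder, alpha = 0.05, max_anomalies = 0.03) {
  q   <- quantile(remainder, c(0.25, 0.75), names = FALSE)
  k   <- 0.15 / alpha                      # 3 when alpha = 0.05
  iqr <- q[2] - q[1]
  lo  <- q[1] - k * iqr
  hi  <- q[2] + k * iqr
  out <- remainder < lo | remainder > hi   # points outside the limits
  cap <- floor(max_anomalies * length(remainder))
  if (sum(out) > cap) {
    # distance beyond the nearer limit; 0 for points inside the limits
    dist <- pmax(lo - remainder, remainder - hi, 0)
    keep <- order(dist, decreasing = TRUE)[seq_len(cap)]
    out  <- seq_along(remainder) %in% keep
  }
  out
}

# Usage on simulated noise with two planted spikes
set.seed(1)
r <- rnorm(200)
r[c(50, 150)] <- c(15, -15)
which(flag_iqr(r))
```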

Seasonal Decomposition

spx %>% 
     plot_stl_diagnostics(
          .date_var = date, 
          .value = adjusted
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
spx %>% 
     plot_stl_diagnostics(
          .date_var = date, 
          .value = return
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
spx %>% 
     plot_stl_diagnostics(
          .date_var = date, 
          .value = volatility
     )
frequency = 5 observations per 1 week
trend = 64 observations per 3 months
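As far as we know, plot_stl_diagnostics() is built on STL (Seasonal and Trend decomposition using Loess), which splits a series into seasonal, trend, and remainder components that add back up to the original. Base R's stl() performs the same kind of decomposition. The sketch below applies it to simulated data with a repeating 5-day pattern, using frequency = 5 and a trend window of 65 observations to roughly mirror the "5 observations per 1 week" and "64 observations per 3 months" messages above (stl() expects odd window spans, hence 65). timetk's exact settings may differ.

```r
# STL on a simulated daily series with a 5-day cycle (stand-in data)
set.seed(99)
n      <- 500
season <- rep(c(0.5, -0.2, 0.1, -0.3, -0.1), length.out = n)  # repeating 5-day pattern
trend  <- cumsum(rnorm(n, sd = 0.3))                           # slowly wandering level
y      <- ts(trend + season + rnorm(n, sd = 0.2), frequency = 5)

fit  <- stl(y, s.window = "periodic", t.window = 65)
comp <- fit$time.series      # columns: seasonal, trend, remainder
head(round(comp, 3))
```

Because the remainder is defined as whatever the seasonal and trend components do not explain, the three columns always sum back to the original series.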

Heteroskedasticity Test (variance not uniform across the time series)

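Using bptest() from the lmtest package (studentized Breusch-Pagan test). Each series is regressed on time with lm(), and the test checks whether the squared residuals of that regression vary with time.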

Hypothesis test:

  • Null hypothesis (H0): The residual variance is constant over time (homoskedastic)

  • Alternative hypothesis (Ha): The residual variance changes over time (heteroskedastic)
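For intuition about what bptest() computes: with its default studentize = TRUE, the studentized (Koenker) Breusch-Pagan statistic equals n × R² from an auxiliary regression of the squared residuals on the model's regressors, compared against a chi-squared distribution whose degrees of freedom equal the number of regressors (here 1, matching df = 1 in the outputs below). The hand-rolled bp_studentized() helper is hypothetical and uses simulated data whose spread grows with x; it is an illustration, not a replacement for lmtest.

```r
# Hand-rolled studentized Breusch-Pagan statistic (illustration only)
bp_studentized <- function(y, x) {
  e    <- resid(lm(y ~ x))            # residuals of the linear trend regression
  u    <- e^2                         # squared residuals
  aux  <- lm(u ~ x)                   # auxiliary regression on the same regressor
  stat <- length(y) * summary(aux)$r.squared  # n * R^2 (Koenker's studentized form)
  df   <- 1                           # one regressor besides the intercept
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(42)
x <- 1:500
y_hetero <- 2 + 0.1 * x + rnorm(500, sd = 0.05 * x)  # noise spread grows with x
bp_studentized(y_hetero, x)
```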

library(lmtest)  # provides bptest()

lm_model_adj <- lm(adjusted ~ as.numeric(date), data = spx)

bptest(lm_model_adj, data = spx)

    studentized Breusch-Pagan test

data:  lm_model_adj
BP = 469.1, df = 1, p-value < 2.2e-16

Since the p-value < 0.05, we reject the null hypothesis in favor of the alternative hypothesis, i.e., there is significant evidence that the variance of the adjusted closing price around its linear trend is not constant over time (the series may require a transformation).

lm_model_ret <- lm(return ~ as.numeric(date), data = spx)

bptest(lm_model_ret, data = spx)

    studentized Breusch-Pagan test

data:  lm_model_ret
BP = 35.784, df = 1, p-value = 2.205e-09

Since the p-value < 0.05, we reject the null hypothesis in favor of the alternative hypothesis, i.e., there is significant evidence that the variance of the daily return is not constant over time (the series may require a transformation).

lm_model_vol <- lm(volatility ~ as.numeric(date), data = spx)

bptest(lm_model_vol, data = spx)

Interpret the result with the same rule: if the p-value < 0.05, reject the null hypothesis and conclude that there is significant evidence that the variance of the volatility series is not constant over time (the series may require a transformation).

Stationarity Test

What is the definition of a stationary time series?

According to the textbook Chapter 8.1 - Stationarity and differencing, “a stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.”

Is there a test to detect time series stationarity?

Yes. A standard choice is the ADF (Augmented Dickey-Fuller) test, available as adf.test() in the tseries package.

Hypothesis test:

  • Null hypothesis (H0): The time series has a unit root (non-stationary)

  • Alternative hypothesis (Ha): The time series is stationary
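The ADF test fits a regression of the first difference on a constant, a linear time trend, the lagged level, and k lagged differences; the test statistic is the t-value on the lagged level, and a large negative value is evidence against a unit root. The sketch below reproduces that regression with lm(), assuming the specification tseries::adf.test() uses, including its default lag order k = trunc((n - 1)^(1/3)), which gives 13 for the 2,526 observations here and matches "Lag order = 13" in the outputs below. The adf_stat() helper is hypothetical and returns only the statistic; adf.test() additionally interpolates a p-value from Dickey-Fuller tables.

```r
# ADF regression by hand: diff(y) on constant, trend, lagged level, lagged diffs
adf_stat <- function(x, k = trunc((length(x) - 1)^(1/3))) {
  dx    <- diff(x)
  z     <- embed(dx, k + 1)          # rows: (dx_t, dx_{t-1}, ..., dx_{t-k})
  n     <- nrow(z)
  dy    <- z[, 1]                    # current difference
  y_lag <- x[(k + 1):(k + n)]        # lagged level y_{t-1}, aligned with dy
  trend <- (k + 1):(k + n)           # linear time trend
  if (k > 0) {
    lags <- z[, -1, drop = FALSE]    # k lagged differences
    fit  <- lm(dy ~ y_lag + trend + lags)
  } else {
    fit  <- lm(dy ~ y_lag + trend)
  }
  # ADF statistic = t value on the lagged level term
  summary(fit)$coefficients["y_lag", "t value"]
}

set.seed(2024)
adf_stat(rnorm(500))   # white noise: strongly negative, i.e. stationary
```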

library(tseries)  # provides adf.test()

adf.test(spx$adjusted)

    Augmented Dickey-Fuller Test

data:  spx$adjusted
Dickey-Fuller = -2.6125, Lag order = 13, p-value = 0.319
alternative hypothesis: stationary

Since the p-value (0.319) > 0.05, we fail to reject the null hypothesis: there is no evidence of stationarity, so we treat the adjusted closing price as non-stationary.

adf.test(spx$return)
Warning in adf.test(spx$return): p-value smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  spx$return
Dickey-Fuller = -13.849, Lag order = 13, p-value = 0.01
alternative hypothesis: stationary

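Since the p-value < 0.05, we reject the null hypothesis in favor of the alternative hypothesis, i.e., the daily return series is stationary. The printed p-value of 0.01 is the smallest value adf.test() reports; the warning indicates that the true p-value is smaller still.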

adf.test(spx$volatility)
Warning in adf.test(spx$volatility): p-value smaller than printed p-value

    Augmented Dickey-Fuller Test

data:  spx$volatility
Dickey-Fuller = -7.5903, Lag order = 13, p-value = 0.01
alternative hypothesis: stationary

Since the p-value < 0.05 (again reported at its floor of 0.01), we reject the null hypothesis in favor of the alternative hypothesis, i.e., the volatility series is stationary.